STATEMENT OF FOCUS



The Wisconsin Research and Development Center for Cognitive Learning 
focuses on contributing to a better understanding of cognitive learning by 
children and youth and to the improvement of related educational practices. 

The strategy for research and development is comprehensive. It includes
basic research to generate new knowledge about the conditions and processes
of learning and about the processes of instruction, and the subsequent
development of research-based instructional materials, many of which are designed
for use by teachers and others for use by students. These materials are tested
and refined in school settings. Throughout these operations behavioral
scientists, curriculum experts, academic scholars, and school people interact,
ensuring that the results of Center activities are based soundly on knowledge of
subject matter and cognitive learning and that they are applied to the
improvement of educational practice.

This Technical Report is from Phase 2 of the Project on Prototypic
Instructional Systems in Elementary Mathematics in Program 1. General
objectives of the Program are to establish rationale and strategy for
developing instructional systems, to identify sequences of concepts and
cognitive skills, to develop assessment procedures for these concepts
and skills, to identify or develop instructional materials associated with
the concepts and cognitive skills, and to generate new knowledge about
instructional procedures. Contributing to the Program objectives, the
Mathematics Project, Phase 1, is developing and testing a televised
course in arithmetic for Grades 1-7 which provides not only a complete
program of instruction for the pupils but also inservice training for
teachers. Phase 2 has a long-term goal of providing an individually guided
instructional program in elementary mathematics. Preliminary activities
include identifying instructional objectives, student activities, teacher
activities, materials, and assessment procedures for integration into a
total mathematics curriculum. The third phase focuses on the development
of a computer system for managing individually guided instruction in
mathematics and on a later extension of the system's applicability.
ABSTRACT



This Technical Report presents the results of three experiments designed
to study the utility of probability measurement procedures with mathematics
test items. In each experiment it was hypothesized that:

1. The use of a probability measurement procedure introduces a
test-taking style which changes the performance being measured.

2. Probability measurement procedures will yield a higher reliability
coefficient than standard scoring procedures.

3. The mean score obtained by probability measurement procedures
for the treatment group will be greater than the mean score
for the same students scored in the standard way which, in turn,
will be greater than the means of students in the control group
who take the test under standard conditions.

4. The reliability coefficients will be ordered in the same manner
as the means.

In each study, these hypotheses were not confirmed. The first two
studies used test items measuring high-level cognitive abilities with
Eleventh and Twelfth Grade students. The third used information items
measuring low-level cognitive abilities with Eighth Grade students.





INTRODUCTION 



This paper outlines some characteristics
of probability measurement procedures for
scoring objective tests, discusses hypothesized
advantages and disadvantages of the
methods, and reports the results of three
experiments designed to learn more about the
technique and compare it with standard
procedures of scoring objective tests.

In many testing situations a student is
presented a multiple-choice item in which he
is asked to decide which of the given
alternatives is correct, or the best. The item is
scored 1 or 0 depending on whether or not his
answer corresponds to that on the key,
regardless of the student's confidence in his
response. Tests composed of difficult items,
such as tests constructed to measure problem-solving,
insightful, or creative cognitive
behaviors, generally produce low reliabilities
using the standard test-taking and scoring
procedures. The initial purpose of the studies
reported here was to see whether a nonstandard
test-taking and scoring procedure would
provide useful, reliable information for such a
test.

The test-taking procedure used asks the
student to specify a degree-of-belief
probability for each of the given alternatives. That
is, the student is presented a multiple-choice
item with five choices and asked to specify
what he believes to be the probability of
correctness of each choice. The total of the
probabilities for the five choices should be 1.

This procedure was proposed in an article
by Shuford (1965), who called it an "admissible
scoring procedure" and claimed it to be
more sensitive to partial knowledge:

Any admissible probability measurement
procedure has a scoring system which
guarantees that any student, at whatever
level of knowledge or skill, can maximize
his expected score if and only if he follows
instructions and honestly reflects his
'degree-of-belief probability' as to the
correctness of a possible answer to the test
item. [Shuford defines testing procedures
which utilize such scoring systems as
admissible probability measurement procedures.]
These degree-of-belief probabilities contain
all the information that can be made
available about the student's knowledge
structure as a consequence of asking the
particular question under consideration. By
way of contrast, multiple-choice and
constructed-response procedures can yield only
partial information as to whether or not these
probabilities exceed certain values or lie
within a very broad range. (Shuford, 1965,
p. 2)
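To make the "if and only if" property concrete, here is a minimal Python sketch (an illustration, not Shuford's own procedure) that compares the expected score of an honest report with a few dishonest ones. It assumes the untruncated form of the logarithmic rule discussed later in this paper, 1 + log10 r, where r is the weight the student reports on the choice that turns out to be correct. The belief vector and the alternative reports are hypothetical.

```python
import math

def log_score(report, correct):
    """Untruncated logarithmic rule: 1 + log10(weight reported on the correct choice)."""
    r = report[correct]
    return 1 + math.log10(r) if r > 0 else float("-inf")

def expected_score(belief, report):
    """Expected item score when the student believes `belief` but submits `report`."""
    return sum(p * log_score(report, j) for j, p in enumerate(belief) if p > 0)

belief = [0.5, 0.3, 0.1, 0.1, 0.0]      # hypothetical degree-of-belief probabilities
alternatives = {
    "honest": belief,
    "bet on favorite": [1.0, 0.0, 0.0, 0.0, 0.0],
    "uniform guess": [0.2] * 5,
    "overconfident": [0.6, 0.3, 0.05, 0.05, 0.0],
}
for name, report in alternatives.items():
    print(f"{name:16s} {expected_score(belief, report):8.4f}")
```

With these numbers the honest report has the largest expected score (about 0.49). Betting everything on the favorite is punished without limit, because the untruncated rule returns minus infinity whenever the correct choice received zero weight; the truncation described later caps that penalty at -1.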

The notion of using degree-of-belief
probabilities is not new in educational literature.
However, little seems to have been done except
to periodically rediscover it and postulate its
utility until the Italian probabilist De Finetti
(1965) reopened the topic with a comprehensive
theoretical treatment. This was quickly
followed by a careful treatment of scoring
procedures associated with degree-of-belief
test-taking (Albert, Massengill, & Shuford, 1966).

In the meantime several empirical studies
have been reported. Ahlgren (1969) summarizes
the results of recent research in this area and
reports that in 26 out of 51 studies an increase
in reliability was obtained by using confidence
scoring rather than standard scoring.
However, other than the studies reported here,
none dealt with mathematics items.

Wilson (196*1 observed that attempts to 
measure "insightful mathematical ability" were
rather unfruitful in spite of considerable feeling
among mathematicians that this is an important
mathematical ability. Instruments developed
for the National Longitudinal Study of
Mathematical Abilities (NLSMA) to detect
insightfulness were considered to be poor. One possible
reason for this was that the tests were too
insensitive. Yet the mathematicians
responsible for developing the tests for NLSMA still
wanted insightful scales to be included in
the Longitudinal Study. Three scales totalling
31 items were administered to Eleventh Year
students in Spring 1964 (NLSMA, 1968).

Upon analysis of the data, the scale
reliabilities were found to be quite low. At that time, Wilson
(1965) hypothesized that using admissible
scoring procedures on this type of test would
yield higher means and higher reliability
coefficients.

The advantages of such procedures stem
from the fact that degree-of-belief
probabilities contain all of the information that can be
made available about the student's knowledge
structure as a consequence of asking the
particular question under consideration. Specific
advantages would include:

(1) Higher reliabilities. For example,
Shuford (1965) reported increases in
split-half reliabilities from .6 or .7 to
around .9 when probability measurement
procedures were used rather than standard
scoring procedures. This could be
expected since the probability
measurement procedure would produce scores
with a smaller fraction of chance
behavior than the standard scoring
procedure. Shuford also argued that
increased reliabilities would be found in
almost all testing situations encountered
in practice if one used an "admissible
probability measurement procedure."

(2) Better prediction and higher validity.
These could be expected since
correlations and validities are limited by
test reliabilities.

(3) More sensitive item analysis. An
item-analysis technique based on the
examination of the patterns of probabilities
assigned to a given item by a population
should be very sensitive.

The most obvious disadvantage of the use of
a probability measurement procedure is that
students must be trained, or instructed, to
follow the probability assignment procedure
and convinced that the maximum score can be
expected if, and only if, it is followed.
Another disadvantage is the greater cost in time
and materials. It takes longer for the student
to assign probabilities to each of five
possible choices than to pick one choice as the
best.

In addition to the different test-taking
characteristics, various scoring procedures
are possible.

Four scoring methods were used in the
experiments reported in this paper. For the
control groups:

(1) Standard (0, 1) scoring, summing the
correct choices.

For the treatment groups:

(2) Summing the probability weights on the
correct choices,

(3) Transforming the data by a spherical
scoring function, and

(4) Transforming the data by a logarithmic
transformation.
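Methods (1) and (2) can be sketched in Python as test totals over items. The sketch assumes each item's alternatives are indexed 0 to 4 (a to e) and that a treatment response is a list of five weights summing to 1; the key and the student responses are hypothetical.

```python
def standard_total(choices, key):
    """Method (1): 0/1 scoring of single choices, summed over items (control groups)."""
    return sum(1 for choice, answer in zip(choices, key) if choice == answer)

def weight_sum_total(weights, key):
    """Method (2): sum of the probability weights placed on the keyed choices."""
    return sum(item[answer] for item, answer in zip(weights, key))

key = [0, 2, 1]                  # hypothetical keyed alternatives (0 = a, 1 = b, ...)
control_choices = [0, 2, 3]      # one control student's single choices
treatment_weights = [            # one treatment student's five weights per item
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.0, 0.7, 0.3, 0.0, 0.0],
]
print(standard_total(control_choices, key))      # 2
print(weight_sum_total(treatment_weights, key))  # about 1.9
```

Under method (2) a uniform guess still earns .2 on the item, whereas a control student who guesses earns either 0 or 1.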

The last two scoring procedures are examples
of what Albert et al. (1966, p. 127) have
called reproducing scoring systems. These
two transformations are scoring systems which
are a part of test procedures which have been
referred to as "admissible probability
measurement procedures."

The spherical scoring function applied to
each item is:

$$S = \frac{r_c}{\sqrt{\sum_{j=1}^{5} r_j^{2}}}$$

where $r_j$ is the probability weight assigned
to the jth alternative and $r_c$ is the weight
assigned to the correct choice for the item. What
this transformation does to a score on an item
where choice (a) is correct is illustrated in Figure 1.

The score for an item is strictly determined
by the probability assigned to the correct
answer and the way in which the student's
uncertainty is distributed over the other answers
(i.e., the relative magnitudes of the other
assigned probabilities). The order in which these
weights are distributed is of no importance. For
instance, Subjects (3) and (4) have the same
transformed score (.29), since the weights on
the other four alternatives have the same magnitudes.
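The spherical rule can be sketched in a few lines of Python, assuming an item's five weights are stored in a list with alternative (a) at index 0 and that the weights are not all zero (they sum to 1). For example, weights of .7 on the correct choice and .3 on one other give about .92. The reordered weight vectors used to show order invariance are hypothetical.

```python
import math

def spherical_score(weights, correct):
    """Spherical rule: weight on the correct choice over the Euclidean length of all weights."""
    norm = math.sqrt(sum(w * w for w in weights))
    return weights[correct] / norm

# Alternative (a), index 0, is taken to be the correct choice.
print(round(spherical_score([0.7, 0.3, 0.0, 0.0, 0.0], 0), 2))  # 0.92
print(round(spherical_score([0.2] * 5, 0), 2))                  # 0.45, uniform guessing

# Moving the incorrect-choice weights around does not change the score.
a = spherical_score([0.2, 0.5, 0.3, 0.0, 0.0], 0)
b = spherical_score([0.2, 0.0, 0.0, 0.3, 0.5], 0)
print(math.isclose(a, b))                                       # True
```

Because the Euclidean length of a probability vector is at most 1, the spherical score is never smaller than the raw weight on the correct choice, which is why this transformation was expected to raise the means.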

Albert et al. (1966) refer to the truncated
logarithmic scoring system as not being
strictly a reproducing scoring system, but
having the reproducing property for values of $r_c$
between .027 and .973. They recommend that this
procedure be followed for practical purposes,
since it is likely that the effect resulting from
the truncation at $r_c$ = .01 is quite acceptable.
The truncated logarithmic scoring function
is:



$$S = \begin{cases} 1 + \log_{10} r_c, & .01 \le r_c \le 1 \\ -1, & 0 \le r_c < .01 \end{cases}$$

where $r_c$ is the probability weight assigned
to the correct choice. This is the only
reproducing scoring system that depends only on
the probability weight that the subject assigns
to the correct choice. The range of scores
assigned to an item is between -1 and 1. This
transformation is particularly hard on
misinformation in that one receives a score of -1 on an
item for assigning 0 to the correct choice.
Figure 2 illustrates what the logarithmic
transformation does to the weights the subject
places on the correct choice.

Figure 1

Spherical Scoring Function Applied to the Responses
of Six Subjects to the Same Item
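A minimal Python sketch of the truncated logarithmic rule, assuming base-10 logarithms (the base that produces the stated range of -1 to 1) and the same list-of-weights representation with the correct alternative given by index.

```python
import math

def truncated_log_score(weights, correct):
    """Truncated logarithmic rule: 1 + log10(r) for r >= .01, and -1 below that."""
    r = weights[correct]          # only the weight on the correct choice matters
    if r < 0.01:
        return -1.0
    return 1.0 + math.log10(r)

print(truncated_log_score([1.0, 0.0, 0.0, 0.0, 0.0], 0))  # 1.0, full confidence, correct
print(truncated_log_score([0.0, 1.0, 0.0, 0.0, 0.0], 0))  # -1.0, misinformation
print(round(truncated_log_score([0.2] * 5, 0), 3))        # 0.301, uniform guessing
```

The two branches meet at r = .01, where 1 + log10(.01) = -1, so the truncation introduces no jump in the score.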



Figure 2 
From this background the following hypotheses
were proposed: 

Hypothesis 1. The use of a probability
measurement procedure introduces a test-taking
style which changes the performance being
measured.

It was decided to examine this hypothesis
by examining the percentage of responses
in three categories: (1, 0) or right-wrong
responses, (.2, .2, .2, .2, .2) or guessing
responses, and other responses. If subjects
are using degree-of-belief probabilities, the
percentage of other responses should be large
in comparison to the other categories.
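The three-way tally for Hypothesis 1 can be sketched in Python as follows. It assumes each response is a list of five nonnegative weights summing to 1 and uses a small tolerance for floating-point comparisons; the example responses are hypothetical.

```python
from collections import Counter

def classify(weights, tol=1e-9):
    """Put one five-weight response into the categories used for Hypothesis 1."""
    if all(abs(w - 0.2) < tol for w in weights):
        return "guessing"                    # (.2, .2, .2, .2, .2)
    top = max(weights)
    if abs(top - 1.0) < tol and sum(weights) - top < tol:
        return "right-wrong"                 # all weight on a single choice
    return "other"                           # a graded degree-of-belief response

def category_percentages(responses):
    counts = Counter(classify(r) for r in responses)
    n = len(responses)
    return {c: 100.0 * counts[c] / n for c in ("right-wrong", "guessing", "other")}

responses = [                                # hypothetical responses, one per item
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.5, 0.3, 0.2, 0, 0],
]
print(category_percentages(responses))
# {'right-wrong': 50.0, 'guessing': 25.0, 'other': 25.0}
```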

Hypothesis 2. Probability measurement
procedures will yield a higher reliability
coefficient than standard scoring procedures.

This hypothesis was to be examined by
putting 90% confidence intervals around the
coefficients (Hoyt, 1941) and seeing whether the
intervals overlap (Feldt, 1965).
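For reference, Hoyt's coefficient comes from a two-way analysis of variance on a persons-by-items matrix of item scores, and it is algebraically equivalent to coefficient alpha (KR-20 for 0/1 items). The Python sketch below assumes the item scores are already computed (0/1, summed weights, or transformed scores) and leaves out Feldt's confidence intervals, which additionally require F-distribution quantiles. The small score matrix is hypothetical.

```python
def hoyt_reliability(scores):
    """Hoyt (1941) ANOVA reliability for a persons-by-items score matrix."""
    n = len(scores)                      # persons
    k = len(scores[0])                   # items
    grand = sum(map(sum, scores)) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_resid = ss_total - ss_persons - ss_items      # persons-by-items interaction
    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return 1 - ms_resid / ms_persons

scores = [[1, 1, 1],                     # hypothetical 0/1 scores: 4 persons, 3 items
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
print(round(hoyt_reliability(scores), 4))  # 0.75
```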

Hypothesis 3. The mean score obtained by
probability measurement procedures for the
treatment group will be greater than the mean
score for the same students scored in the
standard way which, in turn, will be greater
than the means of students in the control
group who take the test under standard
conditions.

This hypothesis was to be tested by simply
ordering the means and rejecting the hypothesis
if the means were not ordered as hypothesized.
The spherical transformation on the treatment
group scores should produce higher means
than the original means, and the logarithmic
transformation, lower means.

Hypothesis 4. The reliability coefficients 
will be ordered in the same manner as the 
means. 

The four coefficients will be examined and 
the hypothesis will be rejected if the ordering 
is not as specified by the hypothesis. 
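The ordering check for Hypothesis 4 is mechanical. As an illustration, this Python sketch applies it to the means and reliabilities reported in Table 1 for the Twelfth Grade study; the dictionary keys are shorthand labels for the four columns of that table.

```python
def rank_order(values):
    """Column labels sorted from largest to smallest value."""
    return sorted(values, key=values.get, reverse=True)

# Means and Hoyt reliabilities from Table 1 (Twelfth Grade study).
means = {"control sum": 6.75, "treatment sum": 6.80, "spherical": 8.16, "logarithmic": 3.45}
reliabilities = {"control sum": 0.638, "treatment sum": 0.624, "spherical": 0.51, "logarithmic": 0.43}

print(rank_order(means))          # ['spherical', 'treatment sum', 'control sum', 'logarithmic']
print(rank_order(reliabilities))  # ['control sum', 'treatment sum', 'spherical', 'logarithmic']
same = rank_order(means) == rank_order(reliabilities)
print("Hypothesis 4 retained" if same else "Hypothesis 4 rejected")
```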

In order to examine the plausibility of these
hypotheses, three experiments were conducted
using students from James Madison Memorial
High School in Madison, Wisconsin. The
first involved Twelfth Graders; the second,
Eleventh Graders; and the third, Eighth
Graders. The first two studies used a test
derived from selected items from the NLSMA
"insightful scales." The third study used a
geometry information test, also derived from
the NLSMA battery.



STUDY NO. 1

The first experiment, involving students
taking Twelfth Year mathematics, was
conducted in Fall 1967. Using a stratified
random assignment procedure, 32 subjects were
assigned to the treatment group and 32
subjects to the control group. Blocking was done
on grade, sex, previous mathematics grade,
and IQ. The subjects assigned to the
treatment group met immediately before the test
for 15 minutes to learn the probability
scoring procedure. Using an overhead projector,
the students in this session were presented
sample multiple-choice items with five
alternatives (Appendix B). For each item they
were asked to specify their beliefs as to the
probability of correctness of each alternative,
where the sum of the probabilities for the five
choices is 1. The students were instructed
that they could maximize their scores if they
honestly reflected their degree-of-belief
probabilities as to the correctness of each of the
choices for an item. The control group was
instructed to take the test in the usual
manner. The testing time for both groups on the
15-item test was 49 minutes. The items were
selected from the insightful items included in the
NLSMA battery (Appendix A). The results of
this study are summarized in Table 1.

The first hypothesis was only partially
substantiated, since students in the Twelfth
Grade treatment group used (1, 0) scoring 50%
of the time and guessing (.2, .2, .2, .2, .2)
14% of the time. Hence, the students used a
different strategy on only 36% of the questions.

Hypotheses 2, 3, and 4 are not supported
by the data. The differences between sum
scores for the treatment and control groups
were negligible. The magnitudes of the
means and the reliabilities are very similar,
so similar, in fact, that no confidence
intervals were calculated for the reliabilities.
However, the variance was reduced for the
treatment group.

The transformed data for the treatment
group conflict with the hypotheses. As
expected, the spherical transformation
produced a higher mean. However, the
transformation had the opposite effect
from what was expected concerning
reliabilities and variances: both decreased. The
logarithmic transformation produced a dramatically
lower mean and reliability, but a larger variance.

Why the hypotheses were not confirmed is
a matter of conjecture. One plausible
explanation was that the items proved not to be
as difficult as had been anticipated. Thus,
it was decided to repeat the experiment.



STUDY NO. 2 

The second study, also conducted in Fall
1967, used Eleventh Graders. For this study
it was decided to increase the length of the
test to 17 items (Appendix A), to decrease
the testing time to 40 minutes, and to increase
the instruction for the treatment group to 40
minutes by including practice in using the
procedure on difficult mathematical items.
Because of schedule difficulties, a matched
rather than a random sample was taken,
blocking on the same variables as before.
The results of this study are summarized in
Table 2.

Table 1

Results of Experiment with Twelfth Grade Students

        Control    Treatment
        (Sum)      (Sum)      (Spherical)   (Logarithmic)
X̄       6.75       6.80       8.16          3.45
r       .638       .624       .51           .43
s²      8.25       5.63       4.81          13.77
N       32         32         32            32
K       15         15         15            15

X̄ = mean; r = reliability (Hoyt); s² = variance; N = subjects; K = items

Table 2

Results of Experiment with Eleventh Grade Students

        Control    Treatment
        (Sum)      (Sum)      (Spherical)   (Logarithmic)
X̄       4.00       4.25       6.33          -1.78
r       .185       .10        -.02          -.56
s²      3.36       1.75       1.56          5.60
N       25         25         25            25
K       17         17         17            17

X̄ = mean; r = reliability (Hoyt); s² = variance; N = subjects; K = items

Again, the first hypothesis was only
partially substantiated. For the Eleventh Grade
treatment group the subjects used (1, 0)
scoring 35% of the time and guessing (.2, .2, .2,
.2, .2) 33% of the time; that is, students were
using a different strategy only 32% of the time.

The second, third, and fourth hypotheses
were again not supported by the data. As in
Study No. 1, the differences between the
sum scores for the treatment and control
groups were negligible.



SUMMARY OF STUDIES NO. 1 AND NO. 2

Why the hypotheses were not confirmed is
not clear. One possibility is that the test
instrument was not suited to probability
scoring. Even for these types of difficult
items, students apparently attempt to arrive
at answers by mathematical techniques and
are willing to bet that their responses are
correct even though the techniques used often
lead to wrong answers.

For example, the typical way many students
in the Eleventh and Twelfth Grades found a
wrong answer to Problem 3 in Appendix A was
to use, in solving a difficult problem, the
technique of first simplifying the algebraic
expression. Thus, the equation

$$(x+1)(x+2)(x+3) = (x+2)(x+3)(x+4)$$

became

$$x + 1 = x + 4,$$

which has no roots (response (E)). Since the
answer was reached using a mathematical
method, and the response is one of the
multiple choices, the subject is certain that his
answer is correct. In line with this, if the
answer found is not one of the five
alternatives, the student resorts to guessing. The
data related to the first hypothesis somewhat
substantiate this conjecture.
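The flaw in this simplification can be checked directly. Dividing both sides by (x + 2)(x + 3) silently assumes that product is nonzero; keeping it instead gives (x + 2)(x + 3)[(x + 4) - (x + 1)] = 3(x + 2)(x + 3) = 0, so the roots are -2 and -3, which is choice (B). The Python sketch below confirms this by testing integer values; because the difference of the two sides is a quadratic, these are its only roots.

```python
def lhs(x):
    return (x + 1) * (x + 2) * (x + 3)

def rhs(x):
    return (x + 2) * (x + 3) * (x + 4)

# rhs(x) - lhs(x) = 3(x + 2)(x + 3): cancelling (x + 2)(x + 3) discards both roots.
roots = [x for x in range(-10, 11) if lhs(x) == rhs(x)]
print(roots)  # [-3, -2]
```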

Other possibilities are that these types of
mathematical items do not lend themselves
to easy elimination of alternatives, or that
the treatment was not strong enough to
convince students to use probability scoring
more than they did. It may also be important
to demonstrate in detail to the treatment
group the admissible scoring transformation
to be used. A better understanding of what
the transformation will do to the weights
assigned could influence the way a subject
assigns weights on the items.

In conclusion, the problem of how one
gets useful, reliable information on difficult
tests measuring high-level cognitive abilities
had not been solved.



STUDY NO. 3 

In the two preceding experiments, a
probability measurement procedure was employed
with a test consisting of very difficult,
complex items which were designed to measure
"insightful mathematical ability." However,
the probability measurement procedure failed
to yield a higher reliability coefficient. As
a further examination of the usefulness of
probability measurement procedures, a third
study was designed using a test which
measures a low cognitive ability level. The
purpose of this experiment was to investigate
whether probability scoring used with a test
measuring knowledge of specific facts would
yield more reliable information than
conventional scoring procedures.

An achievement test consisting of 30
multiple-choice items was constructed from a
pool of 74 items from an NLSMA battery of
geometry items (NLSMA, 1968, Report 2).

In case the treatment for learning the
probability scoring procedure had not been
strong enough to produce the desired results
in the two previous experiments, the
treatment was strengthened. The treatment period
was lengthened to 50 minutes of instruction
and practice the day before the testing,
followed the next day by fifteen minutes of
review and practice immediately before the
30-minute testing period.

In the training period, as with the two
previous experiments, a pamphlet, "Training
for Probability Scoring," was handed out and
discussed (Appendix B). An overlay similar
to that used for the first two experiments was
employed to demonstrate how to score items
using the method (Appendix B). Also, two
practice tests were used, one involving
analogies from the Lorge-Thorndike Intelligence Test,
Verbal Battery (1954); the other, geometrical
concepts not measured by the test used in
the experiment (Appendix B). It was
hypothesized that by having students score their own
practice tests using the spherical scoring
procedure (Appendix B), they would be more
readily convinced to use probability scoring
procedures rather than resort to their
usual test-taking strategies. It was also
decided to use a practice test consisting of
items very similar to the test to be given.
Thus it was anticipated that the practice
session would be similar to the testing session.

For this study, four Eighth Grade classes
taught by the same teacher at Madison
Memorial Junior High School in Madison, Wisconsin,
were used. The teacher identified two of the
four classes as high-mathematical-achieving
classes and two as low-mathematical-achieving
classes. By flipping a coin, one class from
each of these two pairs was assigned to the
control group (67 students) and the other to
the treatment group (58 students). With respect
to previous mathematics grades, the control
group had 14 in the A to B+ range, 31 in
the B to C+ range, and 22 in the C to F range.
The treatment group had 11 in the A to B+ range,
26 in the B to C+ range, and 21 in the C to F
range. The average IQ for the control group
was 115.4 and for the treatment group, 120.5.
The classroom teacher administered the
training session for the two classes in the
treatment group and also the testing sessions. The
results of this study are summarized in Table 3.

Again, the results do not support the
hypotheses. For Hypothesis 1, there was some
change in the test-taking style for the
treatment group, which used (1, 0) scoring 61% of
the time, guessing (.2, .2, .2, .2, .2) scoring
5% of the time, and some other scoring scheme
34% of the time. Scoring by simple summing of
the weights placed on the correct alternative
yielded almost exactly the same mean and
reliability coefficient as the control group.

With respect to Hypothesis 2, the probability
transformation measures applied to the
treatment group yielded a lower reliability than the
control group or the treatment group under
simple summing.

For Hypotheses 3 and 4, again the means are not
ordered in the direction hypothesized, nor are
the reliabilities ordered in the same manner as
the means.



Table 3

Results of Experiment with Eighth Grade Students

        Control    Treatment
        (Sum)      (Sum)      (Spherical)   (Logarithmic)
X̄       20.66      20.58      22.06         16.97
r       .83        .84        .80           .75
s²      26.02      9.59       14.82         40.71
N       67         58         58            58
K       30         30         30            30

X̄ = mean; r = reliability (Hoyt); s² = variance; N = subjects; K = items




SUMMARY 



It was felt at the conclusion of Study No. 1
that probability scoring did not increase the
reliability coefficient because the items were
not difficult enough and the training in
probability scoring was not strong enough.
However, in Study No. 2, when the training
period was lengthened, practice in scoring
difficult mathematical items was included, and
the test was made more difficult for the
subjects, these changes still did not increase
the reliability coefficients.

At that time, the following possible
explanations were raised:

(1) The problem-solving set students
employ when trying to solve difficult
mathematical problems does not allow
probability scoring procedures to be effective.

(2) The training procedure was not effective.

(3) Probability scoring procedures may not
necessarily increase the reliability
coefficient of a test.

It was then decided to design a third
experiment using mathematical items designed
to test recall of information at a lower
cognitive level. In the previous experiments the
students had not been told the admissible
scoring procedure being used. For Study No. 3
the training was lengthened to include
teaching the subjects to use a spherical admissible
scoring procedure. However, again the
reliability coefficient was not increased. It
was anticipated that Experiment 3 would
clarify the utility of the procedure. However,
the reliability coefficient for the control group
was quite high (.83). Thus, the test
reliability may have been too high to expect much of
an increase by employing an admissible
scoring procedure. (One should note that the
treatment group's reliability did decrease from
.84 under simple summing to .80 under the
spherical scoring procedure.)

While these studies have not eliminated
any of the three alternative explanations of the
results, increasing the time of the training
would seem questionable in light of the
cost-effectiveness of putting the probability
scoring procedure into practice. One
alternative would be to use the commercial materials
of Massengill and Shuford (1968). These
materials employ a device which calculates the
logarithmic scoring function. However, it is
the opinion of these authors that using this
device would, again, probably not appreciably
increase the reliability of the tests used in
these studies.

Although the probability scoring procedures
have not produced greater reliabilities in the
studies reported here, the method certainly
has definite assets, particularly concerning
information about an individual's score. If
the subject reflects his true degree of belief,
a teacher can tell immediately whether a student
is misinformed (0 on the correct alternative),
whether he is guessing (.2, .2, .2, .2, .2),
or whether he is correctly informed (1, or a
number close to 1, on the correct alternative)
on any particular item. This is certainly more
informative than the traditional method of (1, 0)
scoring.
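A minimal Python sketch of that reading of a single item response. The 0.9 cutoff for "close to 1" and the extra "partial knowledge" label for everything in between are assumptions made for illustration, not part of the procedure described in this report.

```python
def diagnose(weights, correct, informed_cutoff=0.9, tol=1e-9):
    """Interpret one honest degree-of-belief response to an item.

    informed_cutoff is a hypothetical threshold for "close to 1".
    """
    k = len(weights)
    if all(abs(w - 1 / k) < tol for w in weights):
        return "guessing"                 # equal weight on every choice
    r = weights[correct]
    if r < tol:
        return "misinformed"              # no weight on the keyed choice
    if r >= informed_cutoff:
        return "correctly informed"
    return "partial knowledge"            # hypothetical catch-all label

for response in ([0, 1, 0, 0, 0], [0.2] * 5, [0.95, 0.05, 0, 0, 0], [0.5, 0.5, 0, 0, 0]):
    print(response, diagnose(response, correct=0))
```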

In conclusion, the three studies indicate 
that the problem of how one gets useful, 
reliable information on difficult tests has not 
been solved. 
APPENDIX A



Tests for Experiments 1, 2, and 3 




TEST 



(Items 1-15 used for Experiment 1, Grade 12)




(E) It cannot be determined from the information given above 

2. Four interior angles of a convex polygon are each right angles. Which of the following
statements applies to this polygon?

(A) Some of the interior angles must be acute

(B) The polygon must be regular 

(C) The sum of the measures of all the interior angles may be arbitrarily large 

(D) The polygon must be a rectangle 

(E) None of the above 

3. Solve the equation: (x + 1)(x + 2)(x + 3) = (x + 2)(x + 3)(x + 4)

(A) -1, -2, -3 

(B) -2, -3 

(C) -2, -3, -4 

(D) -1, -2, -3, -4 

(E) The equation has no roots 

4. A club of 18 boys had a baseball team (9 players) and a football team (11 players). Five boys 
were on neither team. How many were on both of the teams ? 

(A) 2 

(B) 5 

(C) 7 

(D) 9 

(E) You cannot tell from the information given 

5. The numbers x for which 10 - x, 10, and 10 + x are the lengths of the sides of a triangle
are exactly the numbers x such that

(A) |x| < 5

(B) |x| > 5

(C) |x| < 10

(D) |x| > 10

(E) |x| < 20

6. The equation 2x^° + 5x - 1 = 0 has a root near zero. Of the following, which best approximates 
this root ? 

(A) -0.5 

(B) -0.2 

(C) 0.1

(D) 0.2 

(E) 0.5 






7. What is the greatest possible distance between a point in the plane and a nearest point with
integer coordinates?

(A) 1 

\fZ 

(B) \ 

(C) sTz 

(D) \Ti 

(E) 2

8. Find the largest value of x which satisfies the equation: 2(3 X ) + 4(8 X ) - 9 = 0. 

1 

m *3 
1 
2 
2 
3 
3 
2 



9. The solution of

x + 5y = 17
1.5x + 7.501y = 25.503

is exactly (2, 3), but the solution of

x + 5y = 17
1.5x + 7.501y = 25.5

is exactly (17, 0). The best explanation of why the above happens is:

(A) the constants have different degrees of accuracy

(B) the graphs of the equations are nearly parallel lines

(C) zero has many peculiar properties

(D) one should never round off

(E) a regular 17-sided polygon is constructible

10. The graphs of the equations y = x and x = y + 3 split the plane into five areas.
(See diagram.) Which of these areas represent the points which satisfy both of the inequalities

x - y - 3 > 0 and y - x > 0?

(A) I

(B) II

(C) I and II

(D) III and IV

(E) III and V

11. The diagram at the right is not necessarily drawn to scale. The line segments at each
vertex are perpendicular. Both a and b are whole numbers. The area of the figure is
13 square inches. What is the perimeter of the figure?

(A) 18 inches

(B) 20 inches

(C) 34 inches

(D) 40 inches

(E) 42 inches



12 
12. Solve the inequality 1/x > 1/(x + 1).

(A) All real numbers satisfy the inequality

(B) x > -1

(C) x < -1

(D) x > 0

(E) x < -1 or x > 0



13. Which of the following is a sketch of the graph of |x| = |y| + 1 ?

[Figure: answer choices (A) through (E), each a sketch of a graph in the xy-plane]

14. Which of the following expressions is equivalent to: (49) x (64) x (56) ? 

(A) (49 x 64 x 56) 27

(B) (49 x 64 x 56) 3 

(C) (56) 3 

(D) -(49 x 64 x 56) 3 

(E) (56) 9 

15. Let √a = x and √b = x + 1. Which one of the following is equal to 2x + 1 ?

(A) √(a + b)

(B)

(C) a + b

(D) b - a

(E) √(a^2 + b^2)



16. Find all integers


(A) 12

(B) 14

(C) 15

(D) 18

(E) 22



n 



Problems 16 and 17 used for 

. .. . 2n + 1 _ 4n + 1 

;uch that < - — 

3 » I) 



Experiment 
3n ± 2 



2 only 

The sum of these integers is (?).







17. Which of the following values of x satisfies the equation ax^2 + bx + c = 0 when a + b + c = 0 ?

(A) b/a

(B)

(C) (a + c)/b

(D) -b/a

(E) c/a




Test - Experiment 3 - 8th Grade 



GEOMETRY TEST INSTRUCTIONS 



In this test you will be asked questions about different topics in geometry. Do not become
discouraged if there are some questions you cannot answer. No one is expected to know about
every topic.

Although there will be some very hard questions, there are also some very easy ones that you
will certainly be able to answer correctly, and these are mixed in among the others. Read every
question!

Here is a sample question to show you how you should mark your answer.

Example 0. If one angle of a triangle contains 90°, the triangle is called:

(A) acute
(B) right
(C) obtuse
(D) isosceles
(E) equilateral

The answer is B. See how letter B has been checked for Example 0.

You are to answer as many questions as you can. Do not spend too much time on any one
question. You should guess only if you can rule out some of the choices. Do not guess wildly.

*For these problems, you will mark each of your answers by checking one of the letters
A, B, C, D, or E. You may use any space on the page for scratchwork.

You will have 30 minutes to answer 30 questions. DO NOT TURN THIS PAGE UNTIL YOU ARE
TOLD TO DO SO.



*The treatment group was told to ignore this. They were instructed to place a probability weight 
in the blank reflecting their degree of belief as to the correctness of the alternative. 
1. The diagonals of a parallelogram must be

(A) mutually perpendicular
(B) parallel
(C) equal in length
(D) bisectors of each other
(E) oblique

2. The geometric shape suggested by a can or a drinking straw is called a

(A) sphere
(B) cone
(C) pyramid
(D) cylinder
(E) cube

3. If the intersection of two different planes is not empty, then the intersection is

(A) a point
(B) two different points
(C) a line
(D) two different lines
(E) a plane

4. How many vertices has the above polygon?

(A) 3
(B) 6
(C) 9
(D) 15
(E) 24

5. An equilateral triangle is

(A) obtuse
(B) scalene
(C) right
(D) hyperbolic
(E) equiangular

6. If two lines are in the same plane, a line which intersects them in two different points
is called

(A) a ray
(B) an oblique line
(C) a transversal
(D) a skew line
(E) a transit

7. Which of the following is true for this figure?

(A) ℓ ∥ m
(B) ℓ = m
(C) ℓ ~ m
(D) ℓ ≅ m
(E) ℓ ⊥ m

8. The following figure illustrates a

(A) prism
(B) cube
(C) cone
(D) pyramid
(E) cylinder

9. If two parallel lines are cut by a transversal, the alternate interior angles are

(A) supplementary
(B) complementary
(C) acute
(D) obtuse
(E) congruent

10. In the figure below, if XY ≅ YZ and XZ is not congruent to YZ, then △XYZ is

(A) equiangular
(B) scalene triangle
(C) a right triangle
(D) an equilateral triangle
(E) an isosceles triangle

11. If the adjacent angles formed by two intersecting lines have equal measures, the
lines are

(A) parallel
(B) oblique
(C) perpendicular
(D) horizontal
(E) vertical

12. In which of the following figures are angles x and y adjacent?
13. Which of the figures below are parallelograms?

[Figure: four shapes labeled I-IV]

(A) II and III only
(B) I and II only
(C) I, II, and III only
(D) II only
(E) I, II, III, and IV

14. In the figure below, which angle is supplementary to ∠XOZ?

15. The following figure represents a

(A) quadrilateral
(B) pentagon
(C) rectangle
(D) hexagon
(E) decagon

16. Which of the following are true?

I. A square is a rectangle
II. A square is a rhombus
III. A square is a parallelogram

(A) I and III only
(B) II and III only
(C) III only
(D) I and II only
(E) I, II, and III

17. Which of the following figures represents ℓ ∥ m?

(A) I only
(B) II only
(C) III only
(D) I and II only
(E) II and III only

18. Which of the angles below is the largest?

19. The geometric shape suggested by a tennis ball or a globe is called a

(A) sphere
(B) cone
(C) pyramid
(D) cylinder
(E) cube

20. How many points has a straight line?

(A) 1
(B) 2
(C) 5
(D) 17
(E) More than can be counted

21. Which one of the following has a different number of diagonals than the others listed?

(A) Rectangle
(B) Rhombus
(C) Trapezoid
(D) Hexagon
(E) Parallelogram

22. All squares are

(A) congruent
(B) equal
(C) similar
(D) collateral
(E) isoperimetric

23. Which of the following pairs of figures appears to be similar?

24. In the figure below, ∠AOB and ∠BOC are

(A) supplementary angles
(B) complementary angles
(C) both right angles
(D) congruent angles
(E) both obtuse angles

25. In a trapezoid, one pair of sides must be

(A) parallel
(B) vertical
(C) supplementary
(D) congruent
(E) isosceles

26. The sum of the measures in degrees of the angles of a triangle

(A) is between 30 and 180
(B) is 180
(C) is between 180 and 360
(D) is 360
(E) depends upon the sizes of the angles

27. Which of the following figures represents a simple closed curve?
28. Two planes perpendicular to the same line are

(A) perpendicular
(B) oblique
(C) intersecting
(D) parallel
(E) skew

29. [Figure: an arrangement of rectangles] How many rectangles are shown above?

(A) 2
(B) 3
(C) 4
(D) ?
(E) 8

30. Which of the following is the measure in degrees of an obtuse angle?

(A) 45
(B) 90
(C) 135
(D) 225
(E) Both C and D
APPENDIX B



Training for Probability Scoring, 
Experiments 1, 2, and 3 




TRAINING FOR PROBABILITY SCORING 
(Used in all three experiments.)



The use of a probability scoring technique is somewhat different from the usual test-taking
strategy.

Instead of choosing one of five alternatives in a multiple-choice item, one puts probability
weights between 0 and 1 on each alternative. A person is guaranteed a maximum score if he
follows instructions and honestly reflects his degree of belief as to the correctness of a possible
answer to the test item. Your score is to be determined by summing the weights you assign to the
correct answers.*
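The rule in the last sentence (used in Experiments 1 and 2, according to the footnote) is linear: each item contributes exactly the weight placed on its keyed answer. A minimal sketch follows; `linear_score` and the one-dictionary-per-item layout are illustrative assumptions, not part of the original materials.

```python
def linear_score(sheets, key):
    """Sum, over items, the weight placed on the keyed (correct) answer.

    sheets: one dict per item mapping a choice letter to its weight;
            letters left out of a dict are treated as weight 0.
    key:    the correct letter for each item, in the same order.
    """
    if len(sheets) != len(key):
        raise ValueError("need exactly one answer-key entry per item")
    return sum(weights.get(correct, 0.0) for weights, correct in zip(sheets, key))


# Three hypothetical items: a sure answer, a 50/50 split, and a pure guess.
sheets = [
    {"B": 1.0},
    {"A": 0.5, "C": 0.5},
    {letter: 0.2 for letter in "ABCDE"},
]
key = ["B", "C", "E"]
print(round(linear_score(sheets, key), 2))  # 1.0 + 0.5 + 0.2 = 1.7
```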

Strategy:

(1) If possible, work the problem using mathematical methods. 

(2) If you arrive at what you believe to be the correct answer, assign 1 to the correct 
answer and 0 to the other choices. 

(3) If you are not definite as to which of the alternatives is the correct choice, try to 
eliminate those which are definitely wrong. Assign 0 probability weights to these. Of 
the alternatives that could possibly be right, assign weights to these with regard to 
your belief in their correctness. The weights should sum to one. 

Strategies for assigning weights:

(1) Do not waste a lot of time figuring out probabilities that add to 1. Keep the weights as
simple as possible. Use .1, .2, .3, etc., as much as possible (don't use .64, .16, .10,
.07, and .03, for example).

(2) Assign 0 to definitely wrong alternatives and 1.00 to a definitely right one. If you have
no idea which alternatives are correct or incorrect and your choice is a random guess,
give each alternative a weight of .2.

(3) If 1 out of 5 alternatives is definitely wrong and the other four seem equally likely to be
right, assign .25, .25, .25, .25 to these alternatives (0, of course, to the wrong one).

If 2 of 5 are definitely wrong and the other 3 seem equally likely to be right, assign
.33, .33, .33 to these, etc.

(4) If one alternative seems more correct than another, be sure this is reflected by assigning 
a higher probability weight to it. 
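Strategies (2) and (3) amount to one recipe: give 0 to every ruled-out alternative and split the weight evenly over the rest. A minimal sketch, where `equal_weights` is a hypothetical helper and the two-decimal rounding mirrors the .2, .25, and .33 used in the examples above:

```python
def equal_weights(choices, ruled_out):
    """Give 0 to each ruled-out choice and split the weight evenly over the rest,
    rounded to two decimals as in the training examples (.2, .25, .33)."""
    remaining = [c for c in choices if c not in ruled_out]
    if not remaining:
        raise ValueError("at least one choice must remain possible")
    share = round(1 / len(remaining), 2)
    return {c: (share if c in remaining else 0.0) for c in choices}


print(equal_weights("ABCDE", set()))       # pure guess: .2 on each choice
print(equal_weights("ABCDE", {"C"}))       # one ruled out: .25 on the other four
print(equal_weights("ABCDE", {"A", "D"}))  # two ruled out: .33 on the other three
```

Because .33 + .33 + .33 = .99, rounded equal shares need not add to exactly 1; the training sheet's own three-way example has the same property.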



* Except for Experiment 3.








OVERLAY PRACTICE SHEET 
(Experiments 1, 2, and 3) 



1. The President of the U.S. is:

a) Rusk b) Johnson c) Nixon d) Humphrey e) Romney

2. The Governor of North Dakota is:

a) George Wallace b) Nils Boe c) William Guy d) James Rhodes e) Ronald Reagan

3. Solve the equation: 5/n - 3/n = 1/4

a) 8 b) 4 c) 2 d) 1/2 e) 1/8

4. The Premier of Israel is:

a) David Ben Gurion b) Dayan c) Nasser d) Abba Eban e) Levi Eshkol

5. The Prime Minister of Canada is:

a) John Diefenbaker b) John Smith c) Lester Pearson d) John D. Rockefeller e) Sir Walter Thomson

6. The number of points common to a straight line and the sides of a triangle cannot be:

a) 0 b) 1 c) 2 d) 3 e) infinite



PRACTICE ITEMS 
Experiment 2 - 11th Grade 

1. If n + 20 is a multiple of 8, then when (if ever) is n + 10 a multiple of 4 ? 

(A) never 
(B) always

(C) whenever n is even 

(D) whenever n is a multiple of 4 

(E) whenever n is a multiple of 8 



2. Which of the following equations has no rational root ?



(A) x - - = 0 

X 

(B) x^2 - 1 = 0

(C) 2x + 3x = 5x

(D) x^2 + x = 1
<E> x 3 =■£ 

3. Arrange the areas P, Q, and R of the following shaded regions in increasing order.

(A) P < Q < R

(B) Q < P < R

(C) P < R < Q

(D) R < P < Q

(E) Q < R < P



4. If the shaded region of the square pic- 
tured has an area between 60 and 70 
square inches and the unshaded area is 
between 75 and 85 square inches, the 
best estimate below of the length of a 
diagonal of the square is: 



(A) 12 inches

(B) 17 inches

(C) 23 inches

(D) 29 inches

(E) 35 inches




5. If x log_b 5 = log_b 25, then x = (?).
(A) 5 




(C) 2 

(D) log 20 

(E) the base b must be known before x can be determined. 



FIRST PRACTICE TEST 
Experiment 3 - 6th Grade 

INSTRUCTIONS 



Look at Sample Question 0. 

0. ROSE DAISY VIOLET 

A red B garden C sweet D grow E lily 

The words in Question 0 are names of flowers. On the next line only lily is the name of a
flower. The letter before lily is E, so we check that blank.

Now look at Question 00. Think in what way the words in Question 00 go together. Then find the
word on the line below that belongs with them.



00. RUN WALK MOVE

A think B dream C march D sing E seem

The right answer is march.

Wait for the signal to begin.
1. BENCH SLAT STOOL

A table B chair C desk D bed E sit

2. POTATO BEET PEA

A nut B banana C vegetable D dinner E carrot

3. BOOK MAGAZINE LETTER

A movie B newspaper C radio D lecture E read

4. SHEEP PIG COW HORSE

A dog B rabbit C deer D wolf E beaver

5. PEEL RIND BARK SHELL

A corn B orange C tree D husk E box

6. DOLLAR PESO MARK LIRA

A change B franc C foreign D purchase E bank

7. MUSICIAN ACTOR HUMORIST SINGER

A ventriloquist B professional C amateur D program E radio

8. ALLEY ROAD DRIVE PATH

A country B glade C passageway D glen E lane

9. STAIRWAY LADDER STAIRS STAIRCASE

A elevator B climb C hill D escalator E grade

10. HERD FLOCK SWARM DROVE

A lair B den C bunch D pack E insects

11. CAR CAB WAGON CART

A train B carriage C vehicle D motor E tandem

12. PIN SAFETY PIN HOOK AND EYE ZIPPER

A button B belt C strap D suspenders E garters

13. TIE CRAVAT STOCK NECKCLOTH

A bib B collar C scarf D kirtle E girdle

14. HONESTY LOYALTY SINCERITY FAITHFULNESS

A passivity B servility C devotion D obsequiousness E compliance

15. PINE SPRUCE HEMLOCK

A chestnut B willow C poplar D fir E maple
SECOND PRACTICE TEST
Experiment 3 - 8th Grade

1. The accompanying figure shows a construction of a

(A) mean proportional of two segments
(B) perpendicular bisector of two segments
(C) median of a triangle
(D) diameter of a circle
(E) tangent to a circular arc

2. How many radii has a circle ?

(A) 1
(B) 3
(C) 5
(D) 9
(E) More than can be counted.

3. The abbreviation m∠FGH means

(A) measure of angle FGH
(B) metric arc FGH
(C) m is the midpoint of FGH
(D) m is perpendicular to FGH
(E) minor angle FGH

4. Circles having the same center are called

(A) congruent
(B) asymmetric
(C) concentric
(D) corresponding
(E) coincident

5. In how many points do a circle and a line tangent to the circle intersect ?

(A) None
(B) One
(C) Two
(D) At least two
(E) Infinitely many

6. In the figure below, which pair of angles are corresponding angles ?

(A) 5, 8
(B) 2, 8
(C) 4, 5
(D) 1, 2
(E) 3, 7

7. Which of the points in the figure below are in the exterior of angle ACE ?

(A) only B
(B) only D
(C) only B and F
(D) only D and F
(E) A, D, E, and F

8. Which of the following is part of a circle?

(A) Radius
(B) Center
(C) Arc
(D) Chord
(E) All of these

9. The total length of a closed curve is called

(A) apothem
(B) area
(C) longitude
(D) slant height
(E) perimeter

10. The axis of the cone shown below is

(A) point P
(B) point Q
(C) segment PR
(D) segment PQ
(E) the circle with center R
Experiment 3

Spherical Transformation Scoring Sheet
(Given to Students)

PROBABILITY WEIGHTS ON THE FIVE CHOICES

Correct   Other choices                   Actual score one
choice    (each A, B, C, D, or E)         receives for item

1         0       0       0       0       1.00
.5        .5      0       0       0       .71
.33       .33     .33     0       0       .58
.25       .25     .25     .25     0       .50
.2        .2      .2      .2      .2      .45
.9        .1      0       0       0       .99
.8        .2      0       0       0       .97
.8        .1      .1      0       0       .98
.7        .3      0       0       0       .92
.7        .2      .1      0       0       .96
.7        .1      .1      .1      0       .97
.6        .4      0       0       0       .83
.6        .3      .1      0       0       .88
.6        .2      .2      0       0       .90
.6        .2      .1      .1      0       .93
.6        .1      .1      .1      .1      .95
.5        .4      .1      0       0       .79
.5        .3      .2      0       0       .81
.5        .3      .1      .1      0       .84
.5        .2      .2      .1      0       .66
.5        .2      .1      .1      .1      .88
.4        .6      0       0       0       .56
.4        .5      .1      0       0       .62
.4        .4      .2      0       0       .67
Correct   Other choices                   Actual score one
choice    (each A, B, C, D, or E)         receives for item

.4        .3      .3      0       0       .69
.4        .3      .2      .1      0       .73
.4        .2      .2      .2      0       .76
.3        .7      0       0       0       .40
.3        .6      .1      0       0       .59
.3        .4      .1      .1      .1      .57
.3        .4      .3      0       0       .52
.3        .3      .2      .2      0       .59
.3        .2      .2      .2      .1      .64
.2        .8      0       0       0       .24
.2        .7      .1      0       0       .29
.2        .6      .2      0       0       .30
.2        .6      .1      .1      0       .33
.2        .5      .2      .1      0       .35
.2        .4      .2      .2      0       .31
.2        .4      .2      .1      .1      .39
.2        .3      .3      .2      0       .39
.2        .3      .3      .1      .1      .47
.2        .3      .2      .2      .1      .43
.1        .9      0       0       0       .11
.1        .8      .1      0       0       .12
.1        .7      .2      0       0       .14
.1        .5      .2      .1      .1      .18
.1        .4      .4      .1      0       .18
.1        .6      .1      .1      .1      .16
.1        .7      .1      .1      0       .14
.1        .3      .3      .2      .1      .20
.1        .3      .3      .3      0       .19
.1        .4      .2      .2      .1      .20
0         a       b       c       d       .00
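The sheet does not print its formula, but its title and first rows are consistent with the spherical scoring rule: the weight on the correct choice divided by the square root of the sum of the squared weights on all five choices. Treat that formula as an inference, not something the source states. The sketch below (`spherical_score` is a hypothetical name) recomputes several rows; a few other printed entries differ from the formula, for example .66 for .5/.2/.2/.1, where the formula gives about .86.

```python
import math


def spherical_score(weights, correct_index):
    """Spherical scoring rule (assumed): the weight on the correct choice
    divided by the Euclidean length of the full weight vector."""
    length = math.sqrt(sum(w * w for w in weights))
    if length == 0:
        return 0.0  # no weight on any choice; score it as zero
    return weights[correct_index] / length


# Rows taken from the sheet; the correct choice is listed first (index 0).
rows = [
    ([1, 0, 0, 0, 0], 1.00),
    ([.5, .5, 0, 0, 0], .71),
    ([.33, .33, .33, 0, 0], .58),
    ([.25, .25, .25, .25, 0], .50),
    ([.2, .2, .2, .2, .2], .45),
    ([.4, .3, .3, 0, 0], .69),
    ([.1, .9, 0, 0, 0], .11),
]
for weights, printed in rows:
    print(weights, round(spherical_score(weights, 0), 2), "sheet:", printed)
```

The last row of the sheet (0 on the correct choice, any weights a, b, c, d elsewhere) scores .00 under this formula as well, since the numerator is zero.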